The Dipping Phenomenon

Authors

  • Marco Loog
  • Robert P. W. Duin
Abstract

One typically expects classifiers to demonstrate improved performance with increasing training set sizes, or at least to obtain their best performance in case one has an infinite number of training samples at one's disposal. We demonstrate, however, that there are classification problems on which particular classifiers attain their optimum performance at a training set size which is finite. Whether or not this phenomenon, which we term dipping, can be observed depends on the choice of classifier in relation to the underlying class distributions. We give some simple examples, for a few classifiers, that illustrate how the dipping phenomenon can occur. Additionally, we speculate about what generally is needed for dipping to emerge. What is clear is that this kind of learning curve behavior does not emerge due to mere chance and that the pattern recognition practitioner ought to take note of it.

1 On Learning Curves and Peaking

The analysis of learning curves, which describe how a classifier's error rate behaves under different training set sizes, is an integral part of almost any proper investigation into novel classification techniques or unexplored classification problems [7]. Though sometimes interest goes only to its asymptotics [9], the learning curve is especially informative in the comparison of two or more classifiers when the whole range of training set sizes is considered. It indicates at what sample sizes one classifier may be preferable over another for a particular type of problem. Also, by means of extrapolation, the curve may give us some clue as to how many additional samples may be needed in a real-world problem to reach a particular error rate. Such analyses are impossible on the basis of a point estimate as obtained, for example, by means of leave-one-out cross-validation on the whole data set at hand.
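As a minimal sketch of how such an expected learning curve can be estimated in practice: for a range of training set sizes, the classifier is repeatedly trained on resampled data and scored on held-out data, and the resulting error rates are averaged. The sketch below assumes scikit-learn is available; the synthetic data set from make_classification and the choice of a linear discriminant are placeholder assumptions, not anything prescribed by the paper.

```python
# Sketch: estimate a learning curve (error rate versus training set size)
# by cross-validated resampling. Data set and classifier are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

sizes, _, test_scores = learning_curve(
    LinearDiscriminantAnalysis(), X, y,
    train_sizes=np.linspace(0.05, 1.0, 10), cv=5,
    shuffle=True, random_state=0)

# Error rate = 1 - accuracy, averaged over the cross-validation folds.
for n, fold_errors in zip(sizes, 1.0 - test_scores):
    print(f"training size {n:5d}   mean error {fold_errors.mean():.3f}")
```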
The learning curve one typically expects to observe falls off monotonically with increasing training set size (see Figure 1). The rate of decrease depends on the particular problem considered and the complexity of the classifier employed. Such behavior can indeed be demonstrated in certain settings in which the selected classifier fits the underlying data assumptions well, see for instance [1,10]. In a similar spirit, various bounds on learning curves also show a monotonic decrease of the expected true error rate with increasing training set sizes [5,16].

Fig. 1. An idealized learning curve in which the error rate drops monotonically with an increasing training set size

That such monotonic behavior can, however, not always be guaranteed has been known at least since the mid-nineties. Both Opper and Kinzel [12] and Duin [4] describe what is nowadays referred to in pattern recognition as the peaking phenomenon for learning curves: the error rate attains a local maximum that does not coincide with the smallest training sample size considered. This phenomenon has been described and investigated, for instance, for the Fisher discriminant classifier [4,13,14], for particular perceptron rules [12,11], and for lasso regression [8]. The naming of this phenomenon alludes to the peaking phenomenon for increasing feature sizes (as opposed to increasing training set sizes, which this paper is concerned with) as originally identified by Hughes [6] in the 1960s. Hughes' phenomenon for such feature curves shows that, for a fixed training sample size, the error initially drops but beyond a certain dimensionality typically starts to rise again.

On the basis of what we know about peaking, we may adjust our expectation about learning curves and still expect classifiers to at least obtain their best performance when an infinite number of training samples is used. But this too turns out to be a false hope, as this work demonstrates. It appears there are classification problems on which particular classifiers attain their optimal performance at a training set size which is finite. In contrast with peaking, we term this phenomenon dipping, as it concerns a minimum in the learning curve; in fact, a non-asymptotic, global minimum.

The next three sections of the paper, Sections 2, 3, and 4, give some simple examples, for three artificial classification problems in combination with specific classifiers, which demonstrate how the dipping phenomenon emerges. Though artificial, the examples clearly illustrate that this kind of learning curve behavior does not merely emerge due to chance, e.g. due to some unfortunate draw of training data, but that it is an issue structurally present in particular problem-classifier combinations. The final section, Section 5, speculates on what generally is needed for dipping. It also offers some further discussion and concludes this contribution.

2 Basic Dipping for Linear Classifiers

Consider a two-class classification problem consisting of one Gaussian distribution and one mixture of two Gaussian distributions (Figure 2). The Gaussians of the second class appear on either side of the Gaussian of the first class. A perfectly symmetric situation is considered here: there is symmetry in the overall distribution and the class priors are equal. It should be stressed, however, that this perfect symmetry is definitely not needed to observe dipping behavior, just as there is no need to stick to Gaussian distributions. This configuration, however, enables us to easily explain why dipping occurs.

Fig. 2. Distribution of two-class data used to illustrate basic dipping

Let us consider what happens when we construct an expected learning curve for the nearest mean classifier (NMC, [3]). In the case of large total training set sizes, both estimated means will lie virtually on top of each other and the expected classification error will reach a worst-case performance of 0.5. If, however, we go to smaller and smaller sample sizes, these means will in expectation be further and further apart due to their difference in variance. In the extreme case in which we have one observation from each class, the one mean will be around the mode of class one and the other will be near one of the two modes of class two. Though one may still end up with means that lead to an error rate of about 0.5, the chances of this are very slim. There will, instead, be many configurations that classify both the first class and one lobe of the second class more or less correctly, which gives an expected error of around 0.25, as only the other lobe of the second class gets misclassified. In conclusion, the smaller the sample size, the higher the probability that the NMC delivers a performance considerably better than chance. Figure 3 gives the resulting learning curve.

[Fig. 3: axes show training size (10 to 1000, logarithmic scale) versus error rate]
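To make this mechanism concrete, the following is a minimal simulation sketch of the setup above: class one is a single Gaussian, class two an equal mixture of two Gaussians placed symmetrically around it, and the NMC assigns a test point to the class whose estimated mean is nearest. The particular means and variances are illustrative assumptions rather than the exact parameters behind Figure 2; they merely reproduce the qualitative behavior.

```python
# Sketch of the Section 2 setup: class 1 is one Gaussian, class 2 an equal
# mixture of two Gaussians on either side of it; the nearest mean classifier
# (NMC) assigns a point to the class with the closest estimated mean.
# All distribution parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sample_class1(n):
    return rng.normal(0.0, 0.3, size=n)

def sample_class2(n):
    # Equal mixture of two Gaussians placed symmetrically around class 1.
    lobe = rng.choice([-1.5, 1.5], size=n)
    return lobe + rng.normal(0.0, 0.3, size=n)

def nmc_test_error(n_per_class, n_test=5000):
    # Training: estimate each class mean from n_per_class samples.
    m1 = sample_class1(n_per_class).mean()
    m2 = sample_class2(n_per_class).mean()
    # Testing: classify fresh samples by the nearest of the two means.
    x1, x2 = sample_class1(n_test), sample_class2(n_test)
    err1 = np.mean(np.abs(x1 - m1) > np.abs(x1 - m2))  # class 1 misassigned
    err2 = np.mean(np.abs(x2 - m2) > np.abs(x2 - m1))  # class 2 misassigned
    return 0.5 * (err1 + err2)  # equal class priors

# Expected learning curve: average the test error over many training draws.
for n in [1, 2, 5, 10, 100, 1000]:
    mean_err = np.mean([nmc_test_error(n) for _ in range(200)])
    print(f"samples per class = {n:4d}   expected error = {mean_err:.3f}")
```

With settings like these, the averaged error lies close to 0.25 for one or two samples per class and climbs towards 0.5 as the training set grows, which is exactly the dip described above.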
